This article introduces Streamlit, a Python library for building data dashboards, as a way for Python programmers to create graphical front-ends without needing to delve into CSS, HTML, or JavaScript. The author, a seasoned data engineer, explains how Streamlit and similar tools enable the creation of attractive dashboards, marking a shift from traditional BI tools like Tableau or Amazon QuickSight. This piece is the first in a series focusing on Streamlit, with future articles planned on Gradio and Taipy. The author aims to build similar layouts and functionality across all three tools using the same data.
These one-liners provide quick and effective ways to assess the quality and consistency of the data within a Pandas DataFrame.
| Code Snippet | Explanation |
| --- | --- |
| `df.isnull().sum()` | Counts the number of missing values per column. |
| `df.duplicated().sum()` | Counts the number of duplicate rows in the DataFrame. |
| `df.describe()` | Provides basic descriptive statistics of numerical columns. |
| `df.info()` | Displays a concise summary of the DataFrame including data types and presence of null values. |
| `df.nunique()` | Counts the number of unique values per column. |
| `df.apply(lambda x: x.nunique() / x.count() * 100)` | Computes the percentage of unique values (relative to the non-null count) for each column. |
| `df.isin([value]).sum()` | Counts the occurrences of a specific value in each column. |
| `df.applymap(lambda x: isinstance(x, type_to_check)).sum()` | Counts the number of values of a specific type (e.g., `int`, `str`) per column. In pandas ≥ 2.1, use `df.map` instead of the deprecated `applymap`. |
| `df.dtypes` | Lists the data type for each column in the DataFrame. |
| `df.sample(n)` | Returns a random sample of n rows from the DataFrame. |
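To see a few of these one-liners in action, here is a minimal sketch using a small hypothetical DataFrame (the column names and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value per column
# and one duplicate row, so the checks have something to find.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Bergen", None],
    "temp": [12.5, 12.5, 9.0, np.nan],
})

print(df.isnull().sum())      # missing values per column
print(df.duplicated().sum())  # number of fully duplicated rows
print(df.nunique())           # distinct values per column
print(df.dtypes)              # data type of each column
```

Each line prints a per-column Series (or a scalar, for `duplicated().sum()`), which makes these checks easy to drop into a quick exploratory session or a pipeline sanity check.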
This article explains how to quickly detect data quality issues and identify their causes using Python for ETL pipelines. It discusses strategies to minimize the time required to fix data quality problems.
This article provides Python tricks and techniques for data ingestion, validation, processing, and testing in data engineering projects. It offers practical solutions for streamlining the code, including tips for data validation, handling errors, and testing.
An exploration of the benefits of switching from the popular Python library Pandas to the newer Polars for data manipulation tasks, highlighting improvements in performance, concurrency, and ease of use.
An in-process analytics database, DuckDB can work with surprisingly large data sets without having to maintain a distributed multiserver system. Best of all? You can analyze data directly from your Python app.
An article discussing a simple and free way to automate data workflows using Python and GitHub Actions, written by Shaw Talebi.
Intro to Streamlit
- Simple and complex Streamlit examples
- Data and state management in Streamlit apps
- Data widgets for Streamlit apps
- Deploying Streamlit apps